Learning Equivalence Classes of Bayesian Network Structures
Approaches to learning Bayesian networks from data typically combine a
scoring function with a heuristic search procedure. Given a Bayesian network
structure, many of the scoring functions derived in the literature return a
score for the entire equivalence class to which the structure belongs. When
using such a scoring function, it is appropriate for the heuristic search
algorithm to search over equivalence classes of Bayesian networks as opposed to
individual structures. We present the general formulation of a search space for
which the states of the search correspond to equivalence classes of structures.
Using this space, any one of a number of heuristic search algorithms can easily
be applied. We compare greedy search performance in the proposed search space
to greedy search performance in a search space for which the states correspond
to individual Bayesian network structures.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
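As an illustration of the baseline search space over individual structures, here is a minimal Python sketch of greedy hill-climbing with single-edge additions, deletions, and reversals. The edge-set representation and the score interface are assumptions for the example, not the paper's implementation; a real system would plug in a decomposable criterion such as BIC or a Bayesian score.

import itertools

def neighbors(dag, nodes):
    """Yield edge sets reachable by one edge addition, deletion, or reversal."""
    for x, y in itertools.permutations(nodes, 2):
        if (x, y) in dag:
            yield dag - {(x, y)}                  # delete x -> y
            yield (dag - {(x, y)}) | {(y, x)}     # reverse to y -> x
        elif (y, x) not in dag:
            yield dag | {(x, y)}                  # add x -> y

def is_acyclic(dag, nodes):
    """Kahn's algorithm: True iff the edge set has no directed cycle."""
    indeg = {n: 0 for n in nodes}
    for _, y in dag:
        indeg[y] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for x, y in dag:
            if x == n:
                indeg[y] -= 1
                if indeg[y] == 0:
                    frontier.append(y)
    return seen == len(nodes)

def greedy_search(nodes, score):
    """Hill-climb from the empty graph until no single-edge move improves."""
    current = frozenset()
    while True:
        better = [g for g in neighbors(current, nodes)
                  if is_acyclic(g, nodes) and score(g) > score(current)]
        if not better:
            return current
        current = max(better, key=score)

Searching over equivalence classes instead replaces these DAG-local moves with operators on class representatives, so a class-level score is computed once per class rather than once per member.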
A Transformational Characterization of Equivalent Bayesian Network Structures
We present a simple characterization of equivalent Bayesian network
structures based on local transformations. The significance of the
characterization is twofold. First, we are able to easily prove several new
invariant properties of theoretical interest for equivalent structures. Second,
we use the characterization to derive an efficient algorithm that identifies
all of the compelled edges in a structure. Compelled edge identification is of
particular importance for learning Bayesian network structures from data
because these edges indicate causal relationships when certain assumptions
hold.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995).
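The local transformation in question is reversal of a covered edge: X -> Y is covered when Y's parents, apart from X itself, coincide exactly with X's parents, and reversing such an edge preserves equivalence. A minimal sketch of the test, with an illustrative edge-set representation:

def parents(dag, node):
    return {x for (x, y) in dag if y == node}

def is_covered(dag, x, y):
    """True iff x -> y is covered: Pa(y) = Pa(x) union {x}."""
    assert (x, y) in dag
    return parents(dag, y) == parents(dag, x) | {x}

# Example: in a -> b, a -> c, b -> c, the edge b -> c is covered
# (Pa(c) = {a, b} = Pa(b) union {b}), so reversing it preserves equivalence.
dag = {("a", "b"), ("a", "c"), ("b", "c")}
print(is_covered(dag, "b", "c"))  # True
print(is_covered(dag, "a", "b"))  # True: Pa(b) = {a} = Pa(a) union {a}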
Fast Learning from Sparse Data
We describe two techniques that significantly improve the running time of
several standard machine-learning algorithms when data is sparse. The first
technique is an algorithm that efficiently extracts one-way and two-way counts, either real or expected, from discrete data. Extracting such counts is
a fundamental step in learning algorithms for constructing a variety of models
including decision trees, decision graphs, Bayesian networks, and naive-Bayes
clustering models. The second technique is an algorithm that efficiently
performs the E-step of the EM algorithm (i.e. inference) when applied to a
naive-Bayes clustering model. Using real-world data sets, we demonstrate a
dramatic decrease in running time for algorithms that incorporate these
techniques.
Comment: Appears in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI 1999).
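To sketch the flavor of the first technique (the data layout here is an assumption for illustration, not the paper's algorithm): when each record stores only its non-default values, one-way counts for the default state can be recovered by subtraction instead of being tallied record by record, so the work scales with the number of non-default entries.

from collections import Counter, defaultdict

def one_way_counts(sparse_records, n_records, defaults):
    """sparse_records: iterable of dicts {var: non-default value}."""
    counts = defaultdict(Counter)
    for rec in sparse_records:
        for var, val in rec.items():
            counts[var][val] += 1
    # Every record not mentioning var had its default value.
    for var, default in defaults.items():
        non_default = sum(counts[var].values())
        counts[var][default] += n_records - non_default
    return counts

records = [{"buys": 1}, {"buys": 1, "clicked": 1}, {}]
counts = one_way_counts(records, 3, {"buys": 0, "clicked": 0})
print(counts["buys"])     # Counter({1: 2, 0: 1})
print(counts["clicked"])  # Counter({0: 2, 1: 1})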
A Decision Theoretic Approach to Targeted Advertising
A simple advertising strategy that can be used to help increase sales of a
product is to mail out special offers to selected potential customers. Because
there is a cost associated with sending each offer, the optimal mailing
strategy depends on both the benefit obtained from a purchase and how the offer
affects the buying behavior of the customers. In this paper, we describe two
methods for partitioning the potential customers into groups, and show how to
perform a simple cost-benefit analysis to decide which, if any, of the groups
should be targeted. In particular, we consider two decision-tree learning
algorithms. The first is an "off the shelf" algorithm used to model the
probability that groups of customers will buy the product. The second is a new
algorithm that is similar to the first, except that for each group, it
explicitly models the probability of purchase under the two mailing scenarios:
(1) the mail is sent to members of that group and (2) the mail is not sent to
members of that group. Using data from a real-world advertising experiment, we
compare the algorithms to each other and to a naive mail-to-all strategy.
Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI 2000).
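The cost-benefit analysis for the second algorithm reduces, for each group, to comparing the expected incremental profit of mailing against the cost of the mailing itself. A minimal sketch with made-up numbers (the two probabilities would come from the learned models):

def should_mail(p_buy_if_mailed, p_buy_if_not, benefit_per_sale, cost_per_mail):
    """Mail iff the expected lift in profit per customer exceeds the cost."""
    expected_lift = (p_buy_if_mailed - p_buy_if_not) * benefit_per_sale
    return expected_lift > cost_per_mail

print(should_mail(0.05, 0.02, 40.0, 0.50))   # True: lift $1.20 > $0.50
print(should_mail(0.05, 0.045, 40.0, 0.50))  # False: lift $0.20 < $0.50

Note that a group with a high purchase probability under both scenarios is not worth mailing: only the difference between the two scenarios matters.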
Efficient Approximations for the Marginal Likelihood of Incomplete Data Given a Bayesian Network
We discuss Bayesian methods for learning Bayesian networks when data sets are
incomplete. In particular, we examine asymptotic approximations for the
marginal likelihood of incomplete data given a Bayesian network. We consider
the Laplace approximation and the less accurate but more efficient BIC/MDL
approximation. We also consider approximations proposed by Draper (1993) and
Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL,
but their accuracy has not been studied in any depth. We compare the accuracy
of these approximations under the assumption that the Laplace approximation is
the most accurate. In experiments using synthetic data generated from discrete
naive-Bayes models having a hidden root node, we find that the Cheeseman-Stutz (CS) measure is the most accurate.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI 1996).
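For reference, the BIC/MDL approximation penalizes the maximized log-likelihood by half the number of free parameters times the log of the sample size. A minimal sketch with a placeholder likelihood value:

import math

def bic(loglik_at_mle, n_params, n_cases):
    """log p(D | S) ~ log p(D | theta_hat, S) - (d / 2) * log N."""
    return loglik_at_mle - 0.5 * n_params * math.log(n_cases)

# e.g. a model with 9 free parameters fit to 1000 cases:
print(bic(-2100.0, 9, 1000))  # about -2131.09

The CS measure instead starts from the marginal likelihood of the completed data and corrects it by the ratio of the incomplete-data to complete-data likelihoods at the fitted parameters, at essentially the same cost as BIC/MDL.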
Finding Optimal Bayesian Networks
In this paper, we derive optimality results for greedy Bayesian-network
search algorithms that perform single-edge modifications at each step and use
asymptotically consistent scoring criteria. Our results extend those of Meek
(1997) and Chickering (2002), who demonstrate that in the limit of large
datasets, if the generative distribution is perfect with respect to a DAG
defined over the observable variables, such search algorithms will identify
this optimal (i.e. generative) DAG model. We relax their assumption about the
generative distribution, and assume only that this distribution satisfies the
composition property over the observable variables, which is a more
realistic assumption for real domains. Under this assumption, we guarantee that
the search algorithms identify an inclusion-optimal model; that is, a
model that (1) contains the generative distribution and (2) has no sub-model
that contains this distribution. In addition, we show that the composition
property is guaranteed to hold whenever the dependence relationships in the
generative distribution can be characterized by paths between singleton
elements in some generative graphical model (e.g. a DAG, a chain graph, or a
Markov network) even when the generative model includes unobserved variables,
and even when the observed data is subject to selection bias.
Comment: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI 2002).
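For readers unfamiliar with the term, the composition property can be stated in standard graphoid form (paraphrased here, not quoted from the paper): for disjoint variable sets X, Y, W, and Z, separate conditional independencies combine, i.e.

% Composition property, in graphoid notation:
X \perp Y \mid Z \quad \text{and} \quad X \perp W \mid Z
\quad \Longrightarrow \quad X \perp (Y \cup W) \mid Z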
Selective Greedy Equivalence Search: Finding Optimal Bayesian Networks Using a Polynomial Number of Score Evaluations
We introduce Selective Greedy Equivalence Search (SGES), a restricted version
of Greedy Equivalence Search (GES). SGES retains the asymptotic correctness of
GES but, unlike GES, has polynomial performance guarantees. In particular, we
show that when data are sampled independently from a distribution that is
perfect with respect to a DAG G defined over the observable variables, then, in the limit of large data, SGES will identify G's equivalence
class after a number of score evaluations that is (1) polynomial in the number
of nodes and (2) exponential in various complexity measures including
maximum-number-of-parents, maximum-clique-size, and a new measure called v-width that is at least as small as, and potentially much smaller than, the other two. More generally, we show that for any hereditary and equivalence-invariant property Π known to hold in G, we retain the large-sample optimality guarantees of GES even if we ignore any GES deletion operator during the backward phase that results in a state for which Π does not hold in the common-descendants subgraph.
Comment: Full version of UAI paper.
Large-Sample Learning of Bayesian Networks is NP-Hard
In this paper, we provide new complexity results for algorithms that learn
discrete-variable Bayesian networks from data. Our results apply whenever the
learning algorithm uses a scoring criterion that favors the simplest model able
to represent the generative distribution exactly. Our results therefore hold
whenever the learning algorithm uses a consistent scoring criterion and is
applied to a sufficiently large dataset. We show that identifying high-scoring
structures is hard, even when we are given an independence oracle, an inference
oracle, and/or an information oracle. Our negative results also apply to the
learning of discrete-variable Bayesian networks in which each node has at most
k parents, for all k > 3.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI 2003).
Learning Bayesian Networks: The Combination of Knowledge and Statistical Data
We describe algorithms for learning Bayesian networks from a combination of
user knowledge and statistical data. The algorithms have two components: a
scoring metric and a search procedure. The scoring metric takes a network
structure, statistical data, and a user's prior knowledge, and returns a score
proportional to the posterior probability of the network structure given the
data. The search procedure generates networks for evaluation by the scoring
metric. Our contributions are threefold. First, we identify two important
properties of metrics, which we call event equivalence and parameter
modularity. These properties have been mostly ignored, but when combined,
greatly simplify the encoding of a user's prior knowledge. In particular, a
user can express her knowledge, for the most part, as a single prior Bayesian
network for the domain. Second, we describe local search and annealing
algorithms to be used in conjunction with scoring metrics. In the special case
where each node has at most one parent, we show that heuristic search can be
replaced with a polynomial algorithm to identify the networks with the highest
score. Third, we describe a methodology for evaluating Bayesian-network
learning algorithms. We apply this approach to a comparison of metrics and
search procedures.
Comment: Appears in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI 1994).
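For the one-parent special case, one standard polynomial-time route (shown here as an illustration; it assumes a decomposable, score-equivalent metric and is not necessarily the paper's exact algorithm) is to reduce the problem to a maximum-weight spanning tree over pairwise score gains:

import networkx as nx

def best_tree(nodes, gain):
    """gain(u, v): symmetric score improvement of linking u and v."""
    g = nx.Graph()
    g.add_nodes_from(nodes)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            w = gain(u, v)
            if w > 0:                       # keep only beneficial links
                g.add_edge(u, v, weight=w)
    return nx.maximum_spanning_tree(g)      # a forest if g is disconnected

# Orienting the resulting tree away from an arbitrary root yields a
# network in which every node has at most one parent.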
A Bayesian Approach to Learning Bayesian Networks with Local Structure
Recently several researchers have investigated techniques for using data to
learn Bayesian networks containing compact representations for the conditional
probability distributions (CPDs) stored at each node. The majority of this work
has concentrated on using decision-tree representations for the CPDs. In
addition, researchers typically apply non-Bayesian (or asymptotically Bayesian)
scoring functions such as MDL to evaluate the goodness-of-fit of networks to
the data. In this paper we investigate a Bayesian approach to learning Bayesian
networks that contain the more general decision-graph representations of the
CPDs. First, we describe how to evaluate the posterior probability (that is, the Bayesian score) of such a network, given a database of observed cases. Second,
we describe various search spaces that can be used, in conjunction with a
scoring function and a search procedure, to identify one or more high-scoring
networks. Finally, we present an experimental evaluation of the search spaces,
using a greedy algorithm and a Bayesian scoring function.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI 1997).
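To make the representations concrete (the layout is an assumption for illustration, not the paper's): a decision-tree CPD tests one parent at each internal node and stores a distribution over the child at each leaf; a decision graph generalizes this by allowing distinct paths to share a leaf, capturing equality constraints a tree cannot.

class Leaf:
    def __init__(self, dist):
        self.dist = dist                 # {child_value: probability}

class Split:
    def __init__(self, parent, children):
        self.parent = parent             # name of the parent tested here
        self.children = children         # {parent_value: subtree}

def cpd_lookup(node, parent_values):
    """Walk the tree to the leaf distribution for this configuration."""
    while isinstance(node, Split):
        node = node.children[parent_values[node.parent]]
    return node.dist

# P(alarm | burglary, earthquake), ignoring earthquake when burglary = 1:
tree = Split("burglary", {
    1: Leaf({1: 0.95, 0: 0.05}),
    0: Split("earthquake", {1: Leaf({1: 0.3, 0: 0.7}),
                            0: Leaf({1: 0.01, 0: 0.99})}),
})
print(cpd_lookup(tree, {"burglary": 0, "earthquake": 1}))  # {1: 0.3, 0: 0.7}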